Stochastic Search In Changing Situations

Authors

Abbas Abdolmaleki

David Simões

Nuno Lau

Luis Paulo Reis

Bob Price

Gerhard Neumann

Abstract

Stochastic search algorithms are black-box optimizers of an objective function. They have recently gained a lot of attention in operations research, machine learning and policy search of robot motor skills due to their ease of use and their generality. However, when the task or objective function changes slightly, many stochastic search algorithms require complete re-learning in order to adapt the solution to the new objective function or the new context. As such, we consider the contextual stochastic search paradigm. Here, we want to find good parameter vectors for multiple related tasks, where each task is described by a continuous context vector. Hence, the objective function might change slightly for each parameter vector evaluation. In this paper, we investigate a contextual stochastic search algorithm known as Contextual Relative Entropy Policy Search (CREPS), an information-theoretic algorithm that can learn from multiple tasks simultaneously. We show the application of CREPS for simulated robotic tasks.

Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

PRELIMINARY VERSION: DO NOT CITE

Introduction

Stochastic search algorithms are gradient-free black-box optimizers of some performance function dependent on a high-dimensional parameter vector. They directly evaluate the execution of a parameter vector by using the return of an episode. Stochastic search algorithms (Hansen et al. 2003; Sun et al. 2009; Stulp and Sigaud 2012; Rückstieß et al. 2008) typically maintain a search distribution over the parameters that we want to optimise, which is used to create samples of the parameter vector. Subsequently, the performance of the sampled parameters is evaluated. Using the samples and their evaluations, a new search distribution is computed using gradient-based updates (Sun et al. 2009; Rückstieß et al. 2008), evolutionary strategies (Hansen et al. 2003), the cross-entropy method (Mannor et al. 2003), path integrals (Stulp and Sigaud 2012; Theodorou et al. 2010), or information-theoretic policy updates (Kupcsik et al. 2013; Abdolmaleki et al. 2015a).

However, many of the previously mentioned algorithms cannot be applied to multi-task learning. In other words, if the task setup or objective function changes slightly, re-learning is needed to adapt the solution to the new situation or the new context. For example, consider optimising the parameters of a humanoid robot controller to kick a ball. Once the characteristics of the ball, such as weight or material, or the objective function, such as the desired kick distance, change, re-learning is needed. One could independently optimize for several target contexts in order to generalize a task, for example optimizing to kick the ball for different distances (contexts). Subsequently, when a new unseen context is presented, the optimized contexts can be generalized through regression methods (Niehaus et al. 2007; Wang et al. 2009). However, optimizing for different contexts and then generalizing between the optimized parameters to unseen contexts are two independent processes. Therefore, even though such approaches have been used successfully, they are time consuming as well as inefficient in terms of the number of needed training samples. In other words, we cannot reuse data points obtained from optimizing a task with context s to improve and accelerate the optimization of a task with context s'.

As such, it is desirable to learn the selection of the parameters for multiple tasks at once without restarting the learning process once we see a new task. This problem setup is also known as contextual policy search (Kupcsik et al. 2013; Kober et al. 2010). Recently, such multi-task learning capability was established for information-theoretic policy search algorithms (Peters et al. 2010), such as the episodic Contextual Relative Entropy Policy Search (CREPS) algorithm (Daniel et al. 2012; Kupcsik et al. 2013). In (Abdolmaleki et al. 2015c), CREPS was successfully used to optimize a walking controller for different speeds.

Despite its advantages, CREPS has a major drawback that keeps it from performing favourably. Like many other stochastic search algorithms, CREPS maintains a Gaussian search distribution and iteratively updates the mean and covariance matrix of that distribution. However, we will show that, due to the covariance matrix update rule of CREPS, the search distribution might collapse prematurely to a point estimate before finding a good solution, which is highly undesirable. Although this multi-task learning capability is not found in other stochastic search algorithms (Hansen et al. 2003; Sun et al. 2009), such as CMA-ES and NES, or in commonly used policy search methods (Stulp and Sigaud 2012; Kober and Peters 2010), these algorithms typically do not suffer from premature convergence. Therefore, to solve premature con...
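
The sample, evaluate and update loop that the introduction attributes to stochastic search algorithms can be illustrated with a minimal sketch. It is not CREPS and not the authors' implementation: it uses a cross-entropy style refit of a diagonal Gaussian, and the toy objective `sphere`, the function name `cross_entropy_search` and all hyperparameters are assumptions chosen only for illustration.

```python
import random
import statistics


def sphere(theta):
    """Toy objective (an assumption): higher is better, optimum at (1, -2)."""
    target = (1.0, -2.0)
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def cross_entropy_search(objective, dim, iterations=40, samples=50, elite=10, seed=0):
    """Gaussian stochastic search with a diagonal covariance.

    Uses a cross-entropy style update (refit to the best samples),
    which is a stand-in here and not the CREPS update rule.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [2.0] * dim
    for _ in range(iterations):
        # 1. Sample parameter vectors from the current search distribution.
        population = [
            [rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(samples)
        ]
        # 2. Evaluate the samples and keep the best-performing ones.
        best = sorted(population, key=objective, reverse=True)[:elite]
        # 3. Refit the mean and standard deviation to the kept samples.
        for d in range(dim):
            column = [theta[d] for theta in best]
            mean[d] = statistics.fmean(column)
            std[d] = statistics.pstdev(column) + 1e-6  # floor keeps std positive
    return mean, std


mean, std = cross_entropy_search(sphere, dim=2)
print("mean:", mean)
print("std:", std)
```

In this sketch the standard deviation shrinks at every iteration, so the search distribution ends up close to a point estimate. On this easy objective that happens near the optimum. The loop must be restarted from scratch if the target inside `sphere` changes, which is the single-task limitation that the contextual setting is meant to remove.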

Similar resources

Using a new modified harmony search algorithm to solve multi-objective reactive power dispatch in deterministic and stochastic models

The optimal reactive power dispatch (ORPD) is a very important aspect of power system planning and is a highly nonlinear, non-convex optimization problem because it consists of both continuous and discrete control variables. Since the power system has inherent uncertainty, this paper presents both deterministic and stochastic models for the ORPD problem in multi objective and sin...

A Combined Stochastic Programming and Robust Optimization Approach for Location-Routing Problem and Solving it via Variable Neighborhood Search algorithm

The location-routing problem is one of the combined problems in the area of supply chain management that simultaneously makes decisions related to the location of depots and the routing of vehicles. In this paper, the single-depot capacitated location-routing problem under uncertainty is presented. The problem aims to find the optimal location of a single depot and the routing of vehicles to serve th...

Solving a Stochastic Cellular Manufacturing Model by Using Genetic Algorithms

This paper presents a mathematical model for designing cellular manufacturing systems (CMSs) solved by genetic algorithms. This model assumes a dynamic production, a stochastic demand, routing flexibility, and machine flexibility. CMS is an application of group technology (GT) for clustering parts and machines by means of their operational and / or apparent form similarity in different aspects ...

Impact of Capacity of Mobile Units on Blood Supply Chain Performance: Results from a Robust Analysis

Background and Objectives: A sudden jump in blood demand during natural disasters may have a strong negative impact on the performance of the blood supply chain. An appropriate response to emergency situations requires a predictive approach to determining the optimal allocation of blood supply chain resources for various disaster scenarios. The present study, thus, presents an optimization model aimed at ...

Non-linear stochastic inversion of 2D gravity data using evolution strategy (ES)

In the current work, a 2D non-linear inverse problem of gravity data is solved using evolution strategies (ES) to find the thickness of a sedimentary layer in a deep-water situation where a thick sedimentary layer usually exists. Such problems are widely encountered in the early stages of petroleum exploration, where potential field data are used to find an initial estimate of the basin geo...

A stochastic network design of bulky waste recycling – a hybrid harmony search approach based on sample approximation

Facing supply uncertainty of bulky wastes, the capacitated multi-product stochastic network design model for bulky waste recycling is proposed in this paper. The objective of this model is to minimize the first-stage total fixed costs and the expected value of the second-stage variable costs. The possibility of operation costs and transportation costs for bulky waste recycling is considered ...

Publication date: 2017
